support virtual packages on generic git hosts (Gitea)#587
support virtual packages on generic git hosts (Gitea)#587ganesanviji wants to merge 5 commits intomicrosoft:mainfrom
Conversation
|
@microsoft-github-policy-service agree |
Review FeedbackThanks @ganesanviji for adding Gitea support! The raw URL download approach is a good idea. A few issues need addressing: 1. API version change breaks GitLab (critical)Changing Options:
2. Virtual package detection too broad
3. Bare
|
danielmeppiel
left a comment
There was a problem hiding this comment.
As per previous comment
|
Hi @danielmeppiel , Thanks for review and I have addressed all the reviewed suggestions, 1. API version change breaks GitLab (critical)Addressed with the preferred approach. For non-GitHub/GHE hosts we now attempt If that returns a non-200 we fall through to API version negotiation, trying 2. Virtual package detection too broadWe did not use
The distinction is driven by a new 3. Bare
|
|
@danielmeppiel - Could you please review the changes and update is there any changes or explanation needed on these changes AS SOON AS POSSIBLE. It would be very helpful to include the gitea support in APM in next release to use. |
There was a problem hiding this comment.
Pull request overview
Adds broader support for installing virtual packages from non-GitHub Git hosts (with a focus on Gitea), by updating dependency parsing heuristics and expanding the downloader’s raw/API fetching logic, plus new regression tests around hostname classification and generic-host URL handling.
Changes:
- Add
is_gitlab_hostname()and use it during virtual-package detection to treat GitLab nested-group paths as repo paths by default. - Extend generic-host downloads with a raw URL attempt and API version “negotiation”.
- Add unit tests covering GitLab hostname detection, Gitea/generic URL parsing expectations, and generic-host download behavior.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_github_host.py | Adds tests for GitLab hostname detection. |
| tests/unit/test_generic_git_urls.py | Adds Gitea/generic-host virtual package detection regression tests. |
| tests/test_github_downloader.py | Adds tests for generic-host raw download + API version fallback behavior. |
| src/apm_cli/utils/github_host.py | Introduces is_gitlab_hostname() helper. |
| src/apm_cli/models/dependency/reference.py | Adjusts virtual-package detection and standard URL parsing behavior for generic hosts/GitLab. |
| src/apm_cli/deps/github_downloader.py | Adds generic-host raw fetch and API version candidate list changes. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 5
| elif is_generic_host: | ||
| has_virtual_ext = any( | ||
| any(seg.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS) | ||
| for seg in path_segments | ||
| ) | ||
| has_collection = "collections" in path_segments | ||
| # GitLab supports nested groups (group/subgroup/repo), so the full | ||
| # path is the repo -- no shorthand subdirectory splitting. | ||
| # Use https://gitlab.com/group/subgroup/repo.git for GitLab nested | ||
| # groups; shorthand subdirectory syntax is not supported for GitLab. | ||
| # All other generic hosts (Gitea, Bitbucket, self-hosted, etc.) use | ||
| # the owner/repo convention, so extra segments are a virtual subdir. | ||
| if has_virtual_ext or has_collection: | ||
| min_base_segments = 2 | ||
| else: | ||
| elif is_gitlab_hostname(validated_host): | ||
| min_base_segments = len(path_segments) | ||
| else: | ||
| min_base_segments = 2 | ||
| else: |
There was a problem hiding this comment.
_detect_virtual_package() still treats any generic FQDN path with >2 segments as a virtual package when there are no explicit virtual indicators (file extension or /collections/). That makes refs like gitea.myorg.com/group/subgroup/repo parse as repo_url=group/subgroup + virtual_path=repo, which contradicts the new tests and breaks nested-group repos on non-GitLab hosts. Adjust the generic-host heuristic so that (absent explicit virtual indicators) the full path is treated as the repo (i.e., do not infer a virtual subdir from extra segments), and require dict form (git+path) when users want subdirectory virtual packages on generic hosts.
See below for a potential fix:
# Generic FQDN hosts may support nested repository paths
# (for example, group/subgroup/repo). Do not infer a virtual
# subdirectory from extra segments unless the path has an explicit
# virtual package indicator. For generic hosts, subdirectory virtual
# packages should use the structured dict form with separate git+path
# fields instead of shorthand path splitting.
if has_virtual_ext or has_collection:
min_base_segments = 2
else:
min_base_segments = len(path_segments)
| class TestGitLabApiVersionNegotiation: | ||
| """API version negotiation: v1 -> v3 -> v4 for generic hosts.""" | ||
|
|
||
| def setup_method(self): | ||
| with patch.dict(os.environ, {}, clear=True), _CRED_FILL_PATCH: | ||
| self.downloader = GitHubPackageDownloader() | ||
|
|
||
| def test_gitlab_v4_reached_after_v1_and_v3_return_404(self): | ||
| """GitLab uses /api/v4/ -- negotiation must try v1, v3, then v4.""" | ||
| dep_ref = DependencyReference.parse("gitlab.myorg.com/owner/repo") | ||
| expected = b"gitlab file content" | ||
|
|
||
| side_effects = [ | ||
| _make_resp(404), # raw URL | ||
| _make_resp(404), # v1 | ||
| _make_resp(404), # v3 | ||
| _make_resp(200, expected), # v4 | ||
| ] | ||
| with patch.object(self.downloader, "_resilient_get", side_effect=side_effects) as mock_get: | ||
| result = self.downloader.download_raw_file(dep_ref, "skill.md", "main") | ||
|
|
||
| assert result == expected | ||
| urls = [c[0][0] for c in mock_get.call_args_list] | ||
| assert "/api/v1/" in urls[1] | ||
| assert "/api/v3/" in urls[2] | ||
| assert "/api/v4/" in urls[3] | ||
|
|
There was a problem hiding this comment.
These tests assume a GitLab fallback of '/api/v4/repos/{owner}/{repo}/contents/...', but that's not a valid GitLab API shape (GitLab uses /api/v4/projects/.../repository/files...). As written, this test suite will lock in behavior that won't work against real GitLab instances and may hide regressions. Either remove the GitLab framing (treat this as "try v1 then v3 for Gitea/Gogs") or update both implementation and tests to use GitLab's actual endpoints.
| class TestGiteaVirtualPackageDetection: | ||
| """Gitea-specific virtual package detection -- supplements TestFQDNVirtualPaths | ||
| and TestNestedGroupSupport with Gitea host fixtures and regression guards | ||
| for the len(path_segments) > 2 over-trigger.""" | ||
|
|
||
| # --- Must NOT be virtual (nested-group repo, no virtual indicators) --- | ||
|
|
||
| def test_three_segment_gitea_path_is_not_virtual(self): | ||
| """group/subgroup/repo on Gitea is a nested-group repo, not virtual.""" | ||
| dep = DependencyReference.parse("gitea.myorg.com/group/subgroup/repo") | ||
| assert dep.host == "gitea.myorg.com" | ||
| assert dep.repo_url == "group/subgroup/repo" | ||
| assert dep.is_virtual is False | ||
|
|
||
| def test_two_segment_gitea_path_is_not_virtual(self): | ||
| """Simple owner/repo on a Gitea host is never virtual.""" | ||
| dep = DependencyReference.parse("gitea.myorg.com/owner/repo") | ||
| assert dep.host == "gitea.myorg.com" | ||
| assert dep.repo_url == "owner/repo" | ||
| assert dep.is_virtual is False | ||
|
|
||
| def test_four_segment_generic_path_without_indicators_is_not_virtual(self): | ||
| """Deep nested groups without file extensions or /collections/ are not virtual.""" | ||
| dep = DependencyReference.parse("git.company.internal/team/skills/brand-guidelines") | ||
| assert dep.is_virtual is False | ||
| assert dep.repo_url == "team/skills/brand-guidelines" | ||
|
|
There was a problem hiding this comment.
The new expectations here (e.g., gitea.myorg.com/group/subgroup/repo and other deep FQDN paths are not virtual unless there are explicit indicators) don't match the current generic-host heuristic in DependencyReference._detect_virtual_package(), which still marks any >2-segment generic-host path as virtual. Unless the parsing logic is updated accordingly, these tests will fail and the feature will still mis-parse nested-group repos on generic hosts.
| # GitLab supports nested groups (group/subgroup/repo), so the full | ||
| # path is the repo -- no shorthand subdirectory splitting. | ||
| # Use https://gitlab.com/group/subgroup/repo.git for GitLab nested | ||
| # groups; shorthand subdirectory syntax is not supported for GitLab. | ||
| # All other generic hosts (Gitea, Bitbucket, self-hosted, etc.) use | ||
| # the owner/repo convention, so extra segments are a virtual subdir. |
There was a problem hiding this comment.
This PR changes how virtual packages are detected/handled for generic FQDN hosts (and introduces GitLab-specific nested-group behavior). The Starlight docs and the apm-guide usage doc currently document virtual package rules and the "dict form required when shorthand is ambiguous" note, but they don't describe the generic-host behavior being introduced here (e.g., whether subdirectory virtual packages are supported via shorthand on non-GitHub hosts, or require object form). Please update the relevant docs pages so users of Gitea/self-hosted hosts know which syntax is supported and when they must use the object form.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
Add support for installing virtual packages from self-hosted Git services like Gitea. Currently, APM only supports virtual packages (subdirectories) on GitHub. This change enables users with Gitea to install packages from subdirectories within repositories.
Changes:
DependencyReferenceto recognize subdirectory packages on generic Git hosts (any FQDN)/api/v3to/api/v1for better compatibility with Gitea and other Git servicesMore details about the changes:
✅ Change 1: Virtual Package Detection (reference.py)
Analysis: This only affects generic Git hosts, not GitHub. Allows subdirectory packages to be detected as virtual even without specific file extensions. Safe because:
GitHub uses separate logic path (is_generic_host = False)
Validation still requires package markers (apm.yml, SKILL.md, etc.) in the subdirectory
No impact on existing GitHub virtual file detection
✅ Change 2: Authenticated Raw Downloads (github_downloader.py)
Analysis: Improves private repo support. Safe because:
Only applies to generic hosts, not GitHub
Falls back to API if raw fails
Uses standard Authorization header format
✅ Change 3: API Endpoint Update
Analysis: Gitea uses /api/v1/, GitHub uses /api/v3/. Safe because:
GitHub still uses /api/v3/
Gitea API v1 is compatible for contents endpoint
Falls back gracefully if endpoint doesn't exist
Motivation:
Enterprise teams using self-hosted Git services (Gitea) cannot currently use APM to install packages from repository subdirectories. This is a significant limitation for organizations that don't use GitHub. These changes enable APM to work seamlessly across all Git hosting platforms.
Type of change
Testing
Tested locally
All existing tests pass
Added tests for new functionality (if applicable)